46 research outputs found

    Comparing Performance and Portability between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs

    The heterogeneous computing paradigm has led to the need for portable and efficient programming solutions that can leverage the capabilities of various hardware devices, such as NVIDIA, Intel, and AMD GPUs. This study evaluates the portability and performance of the SYCL and CUDA languages for one fundamental bioinformatics application (Smith-Waterman protein database search) across different GPU architectures, considering single and multi-GPU configurations from different vendors. The experimental work showed that, while both CUDA and SYCL versions achieve similar performance on NVIDIA devices, the latter demonstrated remarkable code portability to other GPU architectures, such as AMD and Intel. Furthermore, the architectural efficiency rates achieved on these devices were superior in 3 of the 4 cases tested. This brief study highlights the potential of SYCL as a viable solution for achieving both performance and portability in the heterogeneous computing ecosystem. Comment: This article was accepted for publication in 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

    Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment

    Background and objectives. The field of computational biology has grown steadily over the years. The interest in researching and developing computational tools for the acquisition, storage, organization, analysis, and visualization of biological data generates the need for new hardware architectures and new software tools that can process big data in acceptable times. In this sense, heterogeneous computing plays an important role in providing solutions, but at the same time it creates new challenges for developers because source code cannot be ported directly between different architectures. Methods. Intel has recently introduced oneAPI, a new unified programming environment that allows code developed in the SYCL-based Data Parallel C++ (DPC++) language to run on different devices such as CPUs, GPUs, and FPGAs, among others. Due to the large amount of CUDA software in the field of bioinformatics, this paper presents the migration of the SW# suite, a biological sequence alignment tool developed in CUDA, to DPC++ through the oneAPI compatibility tool dpct (recently renamed SYCLomatic). Results. SW# has been completely migrated with little programmer intervention in terms of hand-coding. Moreover, it has been possible to port the migrated code between different architectures (considering different target platforms and vendors) with no noticeable performance degradation. Conclusions. The SYCLomatic tool presented a great performance-portability rate. SYCL and Intel oneAPI can offer attractive opportunities for the bioinformatics community, especially considering the large body of CUDA-based legacy code.

    Monitoring and preliminary analysis of the natural responses recorded in a poorly accessible streambed spring located at a fluviokarstic gorge in Southern Spain

    The analysis of natural responses (hydrodynamic, hydrothermal and hydrochemical) of karst springs is a well-established approach to provide insights into the hydrogeological functioning of the aquifers that they drain. However, a suitable monitoring program of these responses is often difficult to launch in poorly accessible streambed springs, due to the mixing between surface water and groundwater, in addition to topographic impediments. This work describes the installation procedure of the measurement equipment and the preliminary hydrogeological dataset collected at the Charco del Moro spring (Southern Spain) over one year. This outlet emerges 5 m below the water surface, at the bottom of a partially flooded gorge, 20-200 m deep and 2 km long, eroded by the Guadiaro River streamflow. It is considered the largest discharge point in the region, draining groundwater from nearby carbonate outcrops to the north, although its catchment area has not yet been established. Continuous (hourly) monitoring of electrical conductivity, water temperature, turbidity and water level (discharge) reflects a high degree of heterogeneity in the duality of groundwater flow and storage dynamics, which is typical of karst conduit flow systems. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    Customized Nios II multi-cycle instructions to accelerate block-matching techniques

    This study focuses on accelerating motion estimation algorithms, which are widely used in video coding standards, by using both the paradigm based on Altera Custom Instructions and an efficient combination of SDRAM and On-Chip memory of the Nios II processor. First, a complete code profiling is carried out before the optimization in order to detect the time bottlenecks affecting the motion compensation algorithms. Then, a multi-cycle Custom Instruction is implemented and added to the specific embedded design. The approach deployed is based on optimizing SOC performance by using an efficient combination of On-Chip memory and SDRAM with regard to the reset vector, exception vector, stack, heap, read/write data (.rwdata), read only data (.rodata), and program text (.text) in the design. Furthermore, this approach aims to enhance these algorithms by incorporating Custom Instructions in the Nios II ISA. Finally, both methods are combined to build the final embedded system. The present contribution thus facilitates motion coding for low-cost Soft-Core microprocessors, particularly the RISC architecture of Nios II implemented in FPGA. It enables us to construct an SOC which processes 50×50 @ 180 fps.

    Smith-Waterman algorithm on heterogeneous systems: A case study

    The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. However, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many applications. Some heuristics are possible but at the expense of losing sensitivity. Fortunately, previous research has shown that new computing platforms such as GPUs and FPGAs are able to accelerate SW and achieve impressive speedups. In this paper we have explored SW acceleration on a heterogeneous platform equipped with an Intel Xeon Phi coprocessor. Our evaluation, using the well-known Swiss-Prot database as a benchmark, has shown that a hybrid CPU-Phi heterogeneous system is able to achieve competitive performance (62.6 GCUPS), even with moderate low-level optimisations. Facultad de Informátic
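
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of Smith-Waterman local alignment scoring and of the GCUPS metric (billions of dynamic-programming cell updates per second). It is not the paper's implementation: the function names `sw_score` and `gcups` are hypothetical, and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions. Protein searches of this kind typically use a substitution matrix and affine gap penalties instead.

```python
def sw_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Assumed scoring: fixed match/mismatch values and a linear gap penalty.
    Only one previous row is kept, so memory is O(len(subject)).
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 of every row is 0
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Cell updates per second in billions: one cell per (query, db) residue pair."""
    return query_len * db_residues / seconds / 1e9
```

Under these assumptions, `sw_score` fills `len(query) * len(subject)` cells per comparison, which is the quantity that `gcups` divides by the elapsed time.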

    State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms

    Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding and the situation has worsened with the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We give a survey of the state-of-the-art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work and results of each implementation. Additionally, as energy efficiency is becoming more important every day, we also survey performance/power consumption works. Finally, we give our view on the future of Smith–Waterman protein searches considering the next generations of hardware architectures and their upcoming technologies. Instituto de Investigación en Informática. Universidad Complutense de Madri

    Evaluation of Intel's DPC++ Compatibility Tool in heterogeneous computing

    The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), thus assisting in the migration process. DPC++ is an implementation of the programming standard for heterogeneous computing known as SYCL, which unifies the development of parallel applications on CPUs, GPUs or even FPGAs. This paper analyzes the DPC++ Compatibility Tool by considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks. For this suite, the tool successfully migrates almost 87% of the code. Moreover, a comparative study of the performance obtained by the migrated code was carried out, showing a moderate overhead in most of the migrated examples. Finally, a performance comparison on different devices was also performed.

    Formation of stellar inner discs and rings in spiral galaxies through minor mergers

    Recent observations show that inner disks and rings (IDs and IRs) are not preferentially found in barred galaxies, pointing to the relevance of formation mechanisms different from the traditional bar-origin scenario. Nevertheless, the role of minor mergers in the formation of these inner components (ICs), while often invoked, is still poorly understood. We have investigated the capability of minor mergers to trigger the formation of IDs and IRs in spiral galaxies through collisionless N-body simulations. Our models show that minor mergers are an efficient mechanism to form rotationally-supported stellar ICs in spirals, requiring neither strong dissipation nor noticeable bars, and suggest that their role in the formation of ICs must have been much more complex than just bar triggering.
